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Abstract 



Wc have developed a coarse-grained formulation for modeling the dynamic behavior of cells 
quantitatively, based on stochasticity and heterogeneity, rather than on biochemical reactions. We 
treat each reaction as a continuous-time stochastic process, while reducing each biochemical quan- 
tity to a binary value at the level of individual cells. The system can be analytically represented 
by a finite set of ordinary linear differential equations, which provides a continuous time course 
O"^. prediction of each molecular state. In this letter, we introduce our formalism and demonstrate it 

with several examples. 



1 Introduction 

With the rapid growth in molecular biology and its related fields, there is a need for a quantitative 
theoretical framework [T] that can both integrate biological knowledge and provide useful predictions 
to guide further experiments. The most standard approach for describing biological phenomena at 
the molecular level is through the use of reaction equations, which originate from chemistry. If the 
rates, stoichiometrics, and initial conditions of each species are known, one can in principle model 
arbitrarily complex processes inside a cell. Even without such detailed information, this approach is 
still very advantageous if the system is approximately static, most of the stoichiometries and conserved 
quantities are known, and a suitable objective function can be defined, as in metabolic networks [2]. 
On the other hand, for systems without these conditions, such as signal transduction networks, precise 
modeling of biological reactions remains challenging. In practice, there are three major obstacles to 
this approach. The first results from incompleteness of our knowledge at a molecular level: Accurate 
modeling requires correct stoichiometries and reaction constants, not to mention knowledge of all 
the reactions involved in a particular reaction network. With very few exceptions [3], we lack such 
knowledge. The second is the inherent heterogeneity of living cells: Molecular biology experiments 
are generally performed using millions of cells at a single time point; it is not clear whether measured 
values reflect the state of a typical cell or an average of distributions of cells whose states are largely 
different from each other [U [5] . The third is the difficulty in obtaining absolute concentrations from 
such experiments: Most standard biochemical experiments measure only relative concentrations of 
molecules with respect to a standard; converting these relative values into absolute concentrations 
requires further experiments. 

The Boolean approach [6], in which all the states of molecules are represented by True of False 
binary values and Boolean algebra is used to define relationships between molecules, is another 
popular approach to circumvent some of the above obstacles. Despite its simplicity, the Boolean 
approach has been successfully applied to biological systems [7]. However, the price we pay for these 
simplifications is that models largely remain qualitative. 

On the other hand, recent studies have revealed that the stochastic and heterogeneous nature of 
cells is indispensable to understanding the design of the cellular dynamics (HI [9]. In this paper, we 
propose a stochastic generalization of the Boolean approach as an alternative to biochemical reaction 
equations. In the following, we shall see how this formulation enables us to describe the dynamics of 
cellular systems while circumventing the above problems on biochemical reaction equations. Although 
this formulation would be applicable to various biological phenomena, we explain it in application to 
signal transduction or gene regulation. 

2 Formulation and Examples 

It is convenient to represent such a system by a regulatory network diagram of biochemical species 
(nodes) and interactions (edges) between them. We use the term "level" to represent the value 
associated with each node, which may represent concentrations of molecules, expression levels of 
genes, enzyme activities and so on. While the deterministic biochemical reaction models implicitly 
assume that the observed levels reflect the concentration of the molecular species in a typical cell, 
we assume that experimental observations on large numbers of cells reflect averages of distributions 
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Figure 1: A schematic representation of two possible approximations for interpreting bulk assays. 
(A) A typical output of a bulk time course experiment. (B) Cells behave stochastically and digitally, 
as assumed in our formulation. (C) Every cell behaves coherently and continuously, as assumed in 
the deterministic biochemical reaction modeling. 

of cells whose states can vary significantly (FIG. [T|). Though, in reality, each cell would take various 
values for each node, we approximate them by a binary value. True (T) or False (F), as in the Boolean 
approach. Thus, though there are several similarities to earlier attempts to introduce randomness 
into Boolean models [10], our approach is intrinsically related to the heterogeneous nature of cells. We 
define interactions between the states of individual molecules by stochastic processes in continuous 
time. Here, in order to facilitate mathematical analysis, we make the assumption that they satisfy 
the Markov property. 



2.1 Simple systems with two nodes 

To illustrate how our method works, let us first consider the simplest case where an interaction is 
defined between two single nodes (FIG. [2]A.). The time-scale parameter r on the edge represents the 
typical time-scale of the interaction. Here, the arrow from node A to node B means "if A is active, 
B will be activated at the stochastic rate of l/r". We may represent the same information by the 
following equation: 

A^B. (1) 

One can perform Monte Carlo simulation using methods such as the Gillespie algorithm |llj to realize 
the dynamics as in FIG. [5J Here, the term "cells" represents the number of independent simulations. 
While each cell behaves digitally, averaging over multiple cells gives more smooth and deterministic 
behavior. Indeed, in the limit of the cell number approaching infinity, this system can be analytically 
represented by the so-called master (or Kolmogorov) equation: 
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where Pij{t) for i,j € {T, F} represents the probability of a cell taking the state oi {A = i^B = j) at 
time t, and the matrix appearing in the right hand side is the transition rate matrix of the system. 



The dynamics of the system is fully encoded in the solution: 
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Now, imagine that we are measuring the level of S by a bulk assay. Then the observed level should 
be linearly related to the number of cells with B = T, in other words, the marginal probabilities of 
B = T, 

P^T{t) = PFTit) + PTTit) = (1 - exp 



Ptf(0) + P*t(0). 



(7) 



Let us assume It {If) is the observed level when all the cells at the measurement are in the activated 
(inactivated) state, respectively. Then, our prediction for the level of B would be given by 



lB{t) = hP.Tit) + If{1 - P*T{t)) = {It - lF)P*T{t) + If. 



(8) 



Thus, the prediction for each level is given by a linear transformation of the corresponding marginal 
probability. Note that, only the two parameters It and Ip for each node are responsible for the scale 
of the level of the node. This is in contrast to biochemical reaction equations where a change in the 
scale of each node non-linearly affects the scales and dynamics of all the other nodes. For observa- 
tions of relative values with respect to a standard, these scaling parameters can be conveniently used 
for absorbing the missing information. In the rest of this paper, the translation from probability 
to observed data is implicit and not mentioned. Note also that in our formulation, although we 
approximate the possible states of each node by a binary value as in Boolean models, the model 
produces continuous time course data on the levels of the molecules because the final prediction is 
obtained by averaging over the population. 
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Figure 2: Examples of systems with two nodes. (A) A system with an interaction. The initial 
condition is Ptf = 1- The black dashed line in the plot shows the exact solution of the master 
equation (r = 1), while the others are the result of Monte-Carlo simulations performed 1, 10 and 100 
times, respectively. (B) A bistable system and the corresponding plots (ti = r2 = 1), as in (A). 



The second example (FIG. [2j3) is a mutual inhibition: 
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which is expected to show bistability [12]. Here the symbol "!" in front of the nodes in the right 
hand side indicates that the node will be inactivated, rather than activated, at the indicated rate. 
The master equation for the system is given by 
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and the solution with the initial condition of Ptt = 1 is 
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In sharp contrast to the solution of the deterministic biochemical reaction equations, where only one 
of two stable states is taken depending on the initial condition, our formulation naturally describes 
the situation where a portion of the cell population flows to one state and the rest goes to the 
other. This is because our formulation takes the heterogeneity of cell culture into account from the 
beginning. 



2.2 A simple oscillatory system 

It is also interesting to consider how this formulation describes oscillatory behavior, which is prevalent 
in many biological systems, including cell cycle, circadian rhythm or calcium signaling. A prototype 
for such oscillatory systems in Boolean models might be the following negative feedback system with 
two nodes [13]: 

(15) 
(16) 
(17) 
(18) 

Here the symbol "!" in front of the nodes in the left hand side indicates that the interaction occurs 
when the corresponding node is not activated. This system itself oscillates. However, it is not obvious 
whether the oscillation is observed at the population level or not. Assuming a common time-scale 



A - 


-^ B, 


B - 


-^ \A 


\A - 


-^ IB 


\B - 


-^ A. 



parameter r for each interaction, the master equation for this system is given by 
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and the marginal probabilities are found to be 
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under the initial condition of Ptf = 1- While the presence of the trigonometric functions indicates 
some oscillatory behavior, they are multiplied by a rapidly decaying factor e ^ . Therefore, on the 
population level, the oscillation is quickly extinguished due to decoherence between cells. In par- 
ticular, under the assumption of the Markov property, where the timing of the processes follows an 
exponential distribution, the standard deviation is as large as the mean timqj r and the decoherence 
occurs very quickly. Thus, even though each cell behaves in complicated ways, these independent 
behaviors are averaged at the population level and not observed in the final prediction. This consid- 
eration also provides an intuitive interpretation of the linearity of our formalism. While it is widely 
believed that biological systems are full of non-linearities, the master equation, whose solutions are 
related to observed values in this formulation, is always linear and robust, and shows neither chaotic 
nor divergent behavior in any parameter region. 



2.3 Boolean algebra 

For more complicated systems beyond two nodes, the formulation is quite parallel, though the number 
of possible states becomes large for systems with many nodes. The only new ingredients are the 
variety of possible interactions. With multiple nodes, one can consider interactions which depend on 
a set of input nodes. For example, one may introduce the "and" operation: 
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Here, this equation means "if both A and B are active, C will be activated at the stochastic rate of 
1/r" and one can easily write down the corresponding master equation. In a similar way, arbitrary 
Boolean operation can be adopted. 



2.4 A demonstration: TNF-NF-kB system 

As discussed above, the aim of our formulation is to model dynamic cellular systems in a coarse- 
grained way. As a proof of concept, we next consider a signaling pathway which has previously been 



*Note also that one can achieve smaller standard deviations by connecting multiple nodes in series while keeping 
the same total time-scales. 
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Figure 3: A model of the TNF-NF-kB system. (A) The network diagram. The number on each edge 
is the time-scale parameter (min) for the interaction. The small circle at the end of the arrow from 
IkB to NF-kB indicates that the interaction occurs when the IkB is inactivated. (B) The predicted 
time courses of our model for WT. (C) The change of the NF-kB activation level for WT, IkB KO 
and A20 KO. 



studied by biochemical reaction equations. Werner et al. modeled the TNF-NF-kB system using 
biochemical reaction equations [31 [H]. TNF-NF-kB pathway is an important signaling pathway in 
immune cells, which can be activated by the cytokine TNF, and negatively regulated by IkB and 
A20. The biochemical reaction equation model consisted of 33 molecular species and 110 kinetic 
coefficients. In contrast, as shown in FIG. [31^, our model consists of only 6 nodes and 13 time-scale 
parameterqll. Before stimulation, the system is in a steady state, with the transcription factor NF-kB 
inactivated by the inhibitor protein IkB. Once the system is stimulated by the ligand TNF, the TNFR 
signaling complex is formed. This complex formation leads to activation of the kinase IKK. IKK then 
phosphorylates IkB, leading to its degradation. Since IkB is an inhibitor of NF-kB, IkB degradation 
results in NF-kB activation. NF-kB then upregulates the transcription of various genes, including 
IkB and A20. IkB, in turn, inhibits NF-kB and A20 inhibits the TNFR complex. As shown in FIG. 
[3jB, our model predicted the time-dependent levels of each species in the network. The authors of 
[14j examined their model assuming wild-type (WT) conditions, IkB deficient conditions, (IkB KO), 
and A20 deficient conditions (A20 KO): In the absence of IkB, NF-kB activation overshoots, whereas 
A20 deficiency leads to delayed adaptation of NF-kB [2]. In order to validate our model, we also 

45 

suppressed IkB induction (deleting "NF-kB — > IkB"; IkB KO) and A20 induction (deleting "NF-kB 

45 I— 1^>. 

— y A20"; A20 KO). As shown in FIG.[3p, the resulting time-course levels of NF-kB reproduced the 
experimentally observed phenotypes. Thus, although our formulation does not depend on the details 
of the biochemical reactions, it still provides an alternative method for modeling signaling pathways. 



'See Appendix for details of the system and parameters. 



3 Summary and Discussion 

In this paper, we have presented a novel formulation of cellular processes at the population level, 
based on a discrete and stochastic description at the single cell level. Though most of the examples 
here are simple toy models, our motivation is toward application to more realistic and complicated 
systems of cells such as signal transduction and transcriptional regulatory networks, as demonstrated 
in the final example. To this end we constructed a model of the TNF-NF-kB system using only the 
coarse-grained topology and typical time-scale parameters of the system. The fact that our models 
were built without fine-tuning parameters indicates that the overall formulation is rather robust, 
independently of the details of the underlying biochemical reactions and absolute values of molecular 
concentrations. In contrast, modeling such a system using biochemical reaction equations requires 
detailed information of the system, including the absolute concentration of molecules, stoichiometries 
for each reaction, and a large number of biochemical reaction constants. In our formulation, diagrams 
such as Fig.[3]A. encode all the dynamic information and provide an intuitive view of the system. Thus, 
our formulation can be more rapidly implemented, and is both robust and intuitive as compared 
with conventional biochemical reaction networks. Clearly the validity of the approximations used 
here (FIG. [1]) is expected to be system-dependent and must be investigated further. Nevertheless, 
the linear and analytic nature of the formulation enables the use of several theoretical tools from 
various disciplines. In addition, it is also interesting to note that intercellular interactions can be 
simultaneously implemented by letting the transition rate matrix depend on the probabilities of the 
nodes responsible for the intercellular activities. Such a unified description of both intra- and inter- 
cellular interaction might help us to understand cell design in multicellular organisms, though it is 
beyond the scope of this letter. We believe that this coarse-grained formulation of cellular systems 
provides a strong framework with which to integrate vast knowledge from molecular biology. 
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Appendix: The TNF-NF-/€B model 

Tumor necrosis factor (TNF) is a soluble protein (cytokine). TNF is known to be involved in various 
biological processes such as apoptosis, inflammation, and immune responses |15j . Its intracellular 
signaling mechanism has been extensively studied [16]. Several studies have built models of this 
system on the basis of biochemical reaction equations (reviewed in |17j ) , including that of Werner et 
al |14j which we used for comparison with our formulation in this letter. 

We have assumed the topology of the system as shown in FIG. 3A in the main text. This topology 
is much simpler than the biochemical reaction network diagram in [T3], but is generally consistent 
with the current understanding of the TNF-NF-kB pathway. The parameters needed for our model- 
ing are the typical time-scales for each regulation. We have adopted the appropriate parameters for 
rate-limiting steps in the data provided in |14] and references therein as typical time-scale parameters 
for our model. 
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Table 1: Equations and parameters of the TNF model 

The units of the above time-scale parameters are minutes. There were five time-scale parameters 
which could not be obtained from the data provided in the references. Though we could numerically fit 
them from the experimental data, we chose to assume reasonable values for purpose of demonstration. 
For example, the time-scales for inductions of IkB and A20 by NF-kB were set to 45 min and the 
remaining time-scales were set to sum to roughly 10 min because the inductions of IkB and A20 
require approximately Ih. 
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